A Prosodic Analysis of Discourse Segments in Direction-Giving Monologues
Abstract
This paper reports on corpus-based research into the relationship between intonational variation and discourse structure. We examine the effects of speaking style (read versus spontaneous) and of discourse segmentation method (text-alone versus text-and-speech) on the nature of this relationship. We also compare the acoustic-prosodic features of initial, medial, and final utterances in a discourse segment.

1 Introduction

This paper presents empirical support for the assumption long held by computational linguists, that intonation can provide valuable cues for discourse processing. The relationship between intonational variation and discourse structure has been explored in a new corpus of direction-giving monologues. We examine the effects of speaking style (read versus spontaneous) and of discourse segmentation method (text-alone versus text-and-speech) on the nature of this relationship. We also compare the acoustic-prosodic features of initial, medial, and final utterances in a discourse segment. A better understanding of the role of intonation in conveying discourse structure will enable improvements in the naturalness of intonational variation in text-to-speech systems as well as in algorithms for recognizing discourse structure in speech-understanding systems.

2 Theoretical and Empirical Foundations

It has long been assumed in computational linguistics that discourse structure plays an important role in Natural Language Understanding tasks such as identifying speaker intentions and resolving anaphoric reference. Previous research has found that discourse structural information can be inferred from orthographic cues in text, such as paragraphing and punctuation; from linguistic cues in text or speech, such as CUE PHRASES [1] (Cohen, 1984; Reichman, 1985; Grosz and Sidner, 1986; Passonneau and Litman, 1993; Passonneau and Litman, to appear) and other lexical cues (Hinkelman and Allen, 1989); from variation in referring expressions (Linde, 1979; Levy, 1984; Grosz and Sidner, 1986; Webber, 1988; Song and Cohen, 1991; Passonneau and Litman, 1993), tense, and aspect (Schubert and Hwang, 1990; Song and Cohen, 1991); from knowledge of the domain, especially for task-oriented discourses (Grosz, 1978); and from speaker intentions (Carberry, 1990; Litman and Hirschberg, 1990; Lochbaum, 1994). Recent methods for automatic recognition of discourse structure from text have incorporated thesaurus-based and other information retrieval techniques to identify changes in topic (Morris and Hirst, 1991; Yarowsky, 1991; Iwańska et al., 1991; Hearst, 1994; Reynar, 1994).

Parallel investigations on prosodic/acoustic cues to discourse structure have investigated the contributions of features such as pitch range, pausal duration, amplitude, speaking rate, and intonational contour to signaling topic change. Variation in pitch range has often been seen as conveying 'topic structure' in discourse. Brown et al. (1980) found that subjects typically started new topics relatively high in their pitch range and finished topics by compressing their range. Silverman (1987) found that manipulation of pitch range alone, or in conjunction with pausal duration between utterances, facilitated the disambiguation of ambiguous topic structures. Avesani and Vayra (1988) also found variation in pitch range in professional recordings which appeared to correlate with topic structure, and Ayers (1992) found that pitch range correlates with hierarchical topic structure more closely in read than spontaneous conversational speech.

Duration of pause between utterances or phrases has also been identi

* The second author was partially supported by NSF Grants No. IRI-90-09018, No. IRI-93-08173, and No. CDA-94-01024 at Harvard University and by AT&T Bell Laboratories.

[1] Also called DISCOURSE MARKERS or DISCOURSE PARTICLES, these are items such as now, first, and by the way, which explicitly mark discourse structure.
Related resources
The Prosody of Discourse Structure and Content in the Production of Persian EFL Learners
The present research addressed the prosodic realization of global and local text structure and content in the spoken discourse data produced by Persian EFL learners. Two newspaper articles were analyzed using Rhetorical Structure Theory. Based on these analyses, the global structure in terms of hierarchical level, the local structure in terms of the relative importance of text segments and the ...
A Corpus-based Analysis on Prosody and Discourse Structure in Japanese Spontaneous Monologues
The aim of this paper is twofold. First, the paper attempts to investigate prosody and discourse structure in Japanese spontaneous monologues by using the prosodic labels of the Corpus of Spontaneous Japanese (CSJ). The analyses of F0 peak trends and prosodic breaks confirmed previous findings in [1]. Secondly, the paper attempts to evaluate the validity of prosodic labels of the X-JToBI syst...
Prosodic Cues to Discourse Segment Boundaries in Human-Computer Dialogue
Theories of discourse structure hypothesize a hierarchical structure of discourse segments, typically tree-structured. While substantial work has been done on identifying and automatically recognizing the textual and prosodic correlates of discourse structure in monologue, comparable cues for dialogue or multiparty conversation, and in particular human-computer dialogue remain relatively less st...
Discourse Structure in Spoken Language: Studies on Speech Corpora
A better understanding of the intonational characteristics of spoken discourse may lead to new empirical techniques for identifying discourse structure from speech, as well as new algorithms for enhancing the naturalness of synthetic speech. This paper summarizes results of pilot studies that demonstrate reliable correlations of discourse and speech properties, and reports findings on a new cor...
Using Prosody to Classify Discourse Relations
This work aims to explore the correlation between the discourse structure of a spoken monologue and its prosody by predicting discourse relations from different prosodic attributes. For this purpose, a corpus of semi-spontaneous monologues in English has been automatically annotated according to the Rhetorical Structure Theory, which models coherence in text via rhetorical relations. From corre...